It is commonly acknowledged that V-functionals with an unbounded kernel are not Hadamard
differentiable and that therefore the asymptotic distribution of U- and V-statistics with an
unbounded kernel cannot be derived by the Functional Delta Method (FDM). However, in this
article we show that V-functionals are quasi-Hadamard differentiable and that therefore a
modified version of the FDM (introduced recently in (J. Multivariate Anal. 101 (2010) 2452-2463))
can be applied to this problem. The modified FDM requires weak convergence of a weighted
version of the underlying empirical process. The latter is not problematic since there exist
several results on weighted empirical processes in the literature; see, for example, (J. Econometrics
130 (2006) 307-335, Ann. Probab. 24 (1996) 2098-2127, Empirical Processes with Applications
to Statistics (1986) Wiley, Statist. Sinica 18 (2008) 313-333). The modified FDM approach has
the advantage that it is very flexible w.r.t. both the underlying data and the estimator of the
unknown distribution function. Both will be demonstrated by various examples. In particular,
we will show that our FDM approach covers most of the results known in the literature for the
asymptotic distribution of U- and V-statistics based on dependent data, and that our assumptions
tend to be even weaker. Moreover, using our FDM approach we extend these results to
dependence concepts that are not covered by the existing literature.

Keywords: Functional Delta Method; Jordan decomposition; quasi-Hadamard differentiability; 
stationary sequence of random variables; U- and V-statistic; weak dependence; weighted 
empirical process 

1. Introduction 

For a distribution function (d.f.) F on the real line, we consider the characteristic 

$$U(F) := \int\!\!\int g(x_1, x_2)\, dF(x_1)\, dF(x_2) \tag{1}$$

with $g : \mathbb{R}^2 \to \mathbb{R}$ some measurable function, provided the double integral exists. A
systematic theory for the nonparametric estimation of $U(F)$ was initiated in [14] and [27].
A natural estimator for $U(F)$ is given by

$$U(F_n) := \int\!\!\int g(x_1, x_2)\, dF_n(x_1)\, dF_n(x_2), \tag{2}$$

where $F_n$ denotes some estimate of $F$ based on the first $n$ observations of a sequence
$X_1, X_2, \ldots$ of random variables (on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$) being identically
distributed according to $F$. Sometimes $U(F_n)$ is called a von Mises statistic (or simply
V-statistic) with kernel $g$. If $F_n$ is the empirical d.f.
$F_n := \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{[X_i, \infty)}$ of $X_1, \ldots, X_n$, then we obtain

$$U(F_n) = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n g(X_i, X_j), \tag{3}$$

and we note that $U(F_n)$ is closely related to the U-statistic

$$U_n := \frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j=1,\, j \neq i}^n g(X_i, X_j). \tag{4}$$

If $X_1, \ldots, X_n$ are i.i.d., then $U_n$ is an unbiased estimator whereas $U(F_n)$ is generally
not. However, $U_n$ and $U(F_n)$ typically share the same asymptotic properties; cf.
Remark 2.5 below. Also notice that, in the nonparametric setting, $U_n$ is the minimum
variance unbiased estimator of $U(F) = \mathbb{E}[g(X_1, X_2)]$ whenever $X_1, \ldots, X_n$ are i.i.d. For
background on U-statistics see, for instance, [5, 7, 14, 16, 20, 21, 23].
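
As a minimal illustration of the difference between (3) and (4), the following sketch evaluates both statistics by brute force for a small sample. The function names `v_statistic` and `u_statistic` and the three-point sample are hypothetical choices made here for illustration; the kernel is Gini's mean-difference kernel mentioned in the text.

```python
from itertools import product


def v_statistic(xs, g):
    """V-statistic as in (3): average of g over all n^2 ordered pairs (diagonal included)."""
    n = len(xs)
    return sum(g(a, b) for a, b in product(xs, repeat=2)) / n**2


def u_statistic(xs, g):
    """U-statistic as in (4): average of g over the n(n-1) ordered pairs with i != j."""
    n = len(xs)
    total = sum(g(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def gini_kernel(x1, x2):
    """Kernel of Gini's mean difference, g(x1, x2) = |x1 - x2|."""
    return abs(x1 - x2)


# Hypothetical toy sample, chosen only so the sums are easy to check by hand.
sample = [1.0, 2.0, 4.0]
v_gini = v_statistic(sample, gini_kernel)  # off-diagonal sum 12 divided by n^2 = 9
u_gini = u_statistic(sample, gini_kernel)  # off-diagonal sum 12 divided by n(n-1) = 6
```

Because this kernel vanishes on the diagonal, the two statistics differ here only through the normalising constants $n^2$ and $n(n-1)$.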

We note that several features of a d.f. $F$ can be expressed as in (1), for instance, the
variance of $F$, or Gini's mean difference of two independent random variables with d.f. $F$;
for details, see Section 3.

Our objective is the asymptotic distribution of $U(F_n)$, that is, the weak limit of the
empirical error $\sqrt{n}(U(F_n) - U(F))$. In the existing literature, the starting point for the
derivation of the asymptotic distribution of U-statistics $U_n$ is usually the Hoeffding
decomposition [14] of $U_n$. Using this decomposition, asymptotic normality of $U_n$ was shown
in [14] for i.i.d. sequences, in [19] for *-mixing stationary sequences, in [8, 31] for $\beta$-mixing
stationary sequences, in [10] for associated random variables, and recently in [6] for
$\alpha$-mixing stationary sequences (recall from [3], page 109: i.i.d. $\Rightarrow$ *-mixing $\Rightarrow$ $\beta$-mixing $\Rightarrow$
$\alpha$-mixing). Another approach is based on the orthogonal expansion of the kernel $g$; see,
for example, [9] and the references therein.

In this article, we derive the asymptotic distribution of U- and V-statistics by means
of a Functional Delta Method (FDM). The use of an FDM is known to be beneficial
for the following reason. Provided the functional $U$ can be shown to be Hadamard
differentiable at $F$, it is basically enough to derive the asymptotic distribution of $F_n$ to
obtain the asymptotic distribution of $U(F_n)$. Therefore, this method is especially useful
for deriving the asymptotic distribution of the estimator $U(F_n)$ based on dependent data,
because, given the Hadamard differentiability, one "only" has to derive the asymptotic
distribution of $F_n$ based on data subjected to a certain dependence structure. There are
already several such results on the asymptotic distribution of $F_n$ based on dependent
data in the literature (e.g., [4, 24, 30]), and new results of this kind (combined with
the assumed Hadamard differentiability) would immediately yield the asymptotic
distribution of $U(F_n)$ as well.

However, one has to be careful with the application of an FDM to our problem. The 
classical FDM in the sense of [12, 13, 18] (see also [28, 29]) cannot be applied to many 
interesting statistical functionals depending on the tails of the underlying distribution, 
because the method typically relies on Hadamard differentiability w.r.t. the uniform 
sup-norm. For instance, as pointed out in [28] and [22], whenever F has an unbounded 
support Hadamard differentiability w.r.t. the uniform sup-norm can be shown neither 
for an L-statistic with a weight function having one of the endpoints (or both endpoints) 
of the closed interval [0, 1] in its support nor for a U-statistic with unbounded kernel. 
However, in [2] a modified version of the FDM was introduced which is suitable also for
nonuniform sup-norms (imposed on the tangential space only), and it was in particular
shown that this modified version can also be applied to L-statistics with a weight function
having one of the endpoints (or both endpoints) of the closed interval [0, 1] in its support.
In contrast to the classical FDM, our FDM is based on the notion of quasi-Hadamard
differentiability and requires weak convergence of the empirical process $\sqrt{n}(F_n - F)$ w.r.t.
a nonuniform sup-norm, that is, in other words, weak convergence of a weighted version
of the empirical process. Fortunately, the latter is not problematic, because there are
many results on the weak convergence of weighted empirical processes in the literature;
see [26] for i.i.d. data, and [4, 24, 30] for dependent data.

In the present article, we demonstrate that the modified version of the FDM can be
applied to derive the limiting distribution for U- and V-statistics with an unbounded
kernel $g$. For simplicity of notation, we restrict the derivations to kernels of degree 2.
However, in Remark 4.2, we clarify how the results can be extended to kernels of degree
$d \ge 3$. Using our FDM approach, we will be able to recover, to a great extent, the results
mentioned above (the conditions imposed by our approach will turn out to tend to be
weaker) and to extend them to other concepts of dependence; cf. Section 3.2. The
FDM approach will also turn out to be useful when the empirical d.f. is replaced by
a different estimate of $F$, for instance by a smoothed version of the empirical d.f.; cf.
Example 3.4.

The remainder of this article is organized as follows. In Section 2, we state the
conditions under which the asymptotic distribution of U- and V-statistics can be derived by
the modified version of the FDM and present our main result. The conditions imposed
can be divided into two parts: on the one hand conditions on the kernel $g$ and the d.f. $F$,
and on the other hand conditions on an empirical process. In Section 3, we give several
examples for both, that is, for kernels $g$ and d.f. $F$ as well as empirical processes
fulfilling the conditions imposed. In Appendix A, we recall the Jordan decomposition
of functions of locally bounded variation, which will be beneficial for our applications
in Section 3. Finally, in Appendix B we give an integration-by-parts formula and
a sort of weighted Helly-Bray theorem. Both results are needed in Section 4 to show
quasi-Hadamard differentiability of V-functionals.

2. Main result

Our main result is Theorem 2.3 below, which provides a CLT for the V-statistic $U(F_n)$
subject to Assumption 2.1. Let $\mathbb{D}_\lambda$ be the space of all càdlàg functions $\psi$ on $\mathbb{R}$ with
$\|\psi\|_\lambda < \infty$, where $\|\psi\|_\lambda := \|\psi \phi_\lambda\|_\infty$ refers to the nonuniform sup-norm based on the
weight function $\phi_\lambda(x) := (1 + |x|)^\lambda$, for $\lambda \in \mathbb{R}$ fixed. As usual, we let $0 \cdot \infty := 0$. If $\lambda > 0$,
then we equip $\mathbb{D}_\lambda$ with the $\sigma$-algebra $\mathcal{D}_\lambda := \mathcal{D} \cap \mathbb{D}_\lambda$ to make it a measurable space, where $\mathcal{D}$
is the $\sigma$-algebra generated by the usual coordinate projections $\pi_x : \mathbb{D} \to \mathbb{R}$, with $\mathbb{D}$
the space of all bounded càdlàg functions on $\mathbb{R}$. Further, let $\mathrm{BV}_{\mathrm{loc}}$ be the space of all
functions $\psi : \mathbb{R} \to \mathbb{R}$ being real-valued and of local bounded variation on $\mathbb{R}$. For $\psi \in \mathrm{BV}_{\mathrm{loc}}$,
we denote by $d\psi^+$ and $d\psi^-$ the unique positive Radon measures induced by the Jordan
decomposition of $\psi$ (for details, see Appendix A), and we set $|d\psi| := d\psi^+ + d\psi^-$.
Finally, we will interpret integrals as being over the open interval $(-\infty, \infty)$, that is,

$$\int = \int_{(-\infty, \infty)}.$$
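
To make the weighted norm concrete, here is a small sketch that approximates $\|\psi\|_\lambda = \sup_x |\psi(x)|\,\phi_\lambda(x)$ on a finite grid. The grid, the test function `psi` and the helper names are assumptions introduced for illustration only; a grid maximum is merely a lower bound for the true supremum.

```python
def phi(x, lam):
    """Weight function phi_lambda(x) = (1 + |x|)^lambda."""
    return (1.0 + abs(x)) ** lam


def weighted_sup_norm(psi, lam, grid):
    """Grid approximation (a lower bound) of ||psi||_lambda = sup_x |psi(x)| * phi_lambda(x)."""
    return max(abs(psi(x)) * phi(x, lam) for x in grid)


def psi(x):
    """Hypothetical test function that decays like (1 + |x|)^(-2)."""
    return (1.0 + abs(x)) ** (-2.0)


# Hypothetical evaluation grid on [-50, 50] with step 0.01.
grid = [k / 100.0 for k in range(-5000, 5001)]

norm_at_2 = weighted_sup_norm(psi, 2.0, grid)  # psi * phi_2 is identically 1
norm_at_3 = weighted_sup_norm(psi, 3.0, grid)  # psi * phi_3 = 1 + |x|, so it grows with the grid
```

In this toy case `psi` has a finite norm for $\lambda = 2$ but not for $\lambda = 3$: on the grid the second value keeps growing as the range is enlarged, which illustrates how the weight controls the admissible tail behaviour.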

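Assumption 2.1. We assume that for some $\lambda > \lambda' \ge 0$ the following assertions hold:

(a) For every $x_2 \in \mathbb{R}$ fixed, the function $g_{x_2}(\cdot) := g(\cdot, x_2)$ lies in $\mathrm{BV}_{\mathrm{loc}} \cap \mathbb{D}_{-\lambda'}$.
Moreover, the function $x_2 \mapsto \int \phi_{-\lambda}(x_1)\, |dg_{x_2}|(x_1)$ is measurable and finite w.r.t. $\|\cdot\|_{-\lambda'}$.

(b) The functions $g_{1,F}(\cdot) := \int g(\cdot, x_2)\, dF(x_2)$ and $g_{2,F}(\cdot) := \int g(x_1, \cdot)\, dF(x_1)$ lie in
$\mathrm{BV}_{\mathrm{loc}} \cap \mathbb{D}$, and $\int \phi_{-\lambda}(x)\, |dg_{i,F}|(x) < \infty$ for $i = 1, 2$. Moreover, the functions
$\bar{g}_{1,F}(\cdot) := \int |g(\cdot, x_2)|\, dF(x_2)$ and $\bar{g}_{2,F}(\cdot) := \int |g(x_1, \cdot)|\, dF(x_1)$ lie in $\mathbb{D}_{-\lambda'}$.

(c) $F$ is continuous, the double integral in (1) exists, and $\int \phi_{\lambda'}(x)\, dF(x) < \infty$.

(d) $F_n : \Omega \to \mathbb{D}$ is $(\mathcal{F}, \mathcal{D})$-measurable, and every realization of $F_n$ is nonnegative and
nondecreasing, has variation bounded by 1, the double integral in (2) exists and
$\int \phi_{\lambda'}(x)\, dF_n(x) < \infty$, for every $n \in \mathbb{N}$.

(e) The process $\sqrt{n}(F_n - F)$ is a random element of $(\mathbb{D}_\lambda, \mathcal{D}_\lambda)$ for all $n \in \mathbb{N}$, and there
is some random element $B^\circ$ of $(\mathbb{D}_\lambda, \mathcal{D}_\lambda)$ with continuous samples such that

$$\sqrt{n}(F_n - F) \xrightarrow{d} B^\circ \quad (\text{in } (\mathbb{D}_\lambda, \mathcal{D}_\lambda, \|\cdot\|_\lambda)). \tag{5}$$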

Assumptions (a) and (b) will allow us to prove quasi-Hadamard differentiability
of the functional $U$ (defined in (1)) at $F$; see Section 4. At first glance they seem
awkward, but in applications their verification is often straightforward; see Section 3.1.
To understand the meaning of conditions (a) and (b), let us suppose that we want to
derive the asymptotic distribution of U- and V-statistics by means of the classical FDM
in the sense of [12, 13, 18]. Then we would have to prove Hadamard differentiability of
the functional $U$ given by (1) at $F$. If $F$ has an unbounded support this could be done
by imposing Assumptions 2.1(a) and (b) with $\lambda' = 0$, that is, with the uniform sup-norm.
Thus, as pointed out in the Introduction, an application of the classical FDM for the
derivation of the asymptotic distribution of U- and V-statistics would, inter alia, require
a uniformly bounded kernel $g$ (cf. [22]). On the other hand, the modified FDM only
requires that this boundedness holds w.r.t. the weaker nonuniform sup-norm $\|\cdot\|_{-\lambda'}$ for
some $\lambda' > 0$.

Remark 2.2. Notice that

(a)' Assumption 2.1(a) could, alternatively, be imposed on $g_{x_1}$ defined similarly to $g_{x_2}$.
Further notice that the second requirement in Assumption 2.1(a) is rather
weak. Indeed: In the examples to be given in Section 3.1 the function
$x_2 \mapsto \int \phi_{-\lambda}(x_1)\, |dg_{x_2}|(x_1)$ even lies in $\mathbb{D}$.

(b)' The last part of Assumption 2.1(b) implies $g_{1,F}, g_{2,F} \in \mathbb{D}_{-\lambda'}$.

(c)' Continuity of $F$ is required for the application of the modified FDM.

(d)' Assumption 2.1(d) is always fulfilled if $F_n$ is the empirical d.f.

(e)' Assumption 2.1(e) does not require that $F$ lies in $\mathbb{D}_\lambda$ or that $F_n$ is a random
element of $(\mathbb{D}_\lambda, \mathcal{D}_\lambda)$. These conditions would actually fail to hold.

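Theorem 2.3. Under Assumption 2.1, we have

$$\sqrt{n}(U(F_n) - U(F)) \xrightarrow{d} \dot{U}_F(B^\circ) \quad (\text{in } (\mathbb{R}, \mathcal{B}(\mathbb{R}), |\cdot|)) \tag{6}$$

with

$$\dot{U}_F(B^\circ) := -\int B^\circ(x)\, dg_{1,F}(x) - \int B^\circ(x)\, dg_{2,F}(x). \tag{7}$$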



Proof. First of all, notice that the integrals in (7) exist by Assumptions 2.1(b) and (e).
Now, let $\mathrm{BV}_{1,d}$ be the space of all càdlàg functions in $\mathrm{BV}_{\mathrm{loc}}$ with variation bounded by 1,
and $\mathbf{U}$ be the class of all nonnegative and nondecreasing functions $f \in \mathrm{BV}_{1,d}$ for which
the integral on the right-hand side of equation (8) below and the integral $\int \phi_{\lambda'}(x)\, df(x)$
exist. We define a functional $U : \mathbf{U} \to \mathbb{R}$ by setting

$$U(f) := \int\!\!\int g(x_1, x_2)\, df(x_1)\, df(x_2), \qquad f \in \mathbf{U}, \tag{8}$$

so that $U(F)$ and $U(F_n)$ defined in (1)-(2) can be written as $U(f)$ with $f := F$ and
$f_n := F_n$, respectively. We are going to apply an FDM to the functional $U$. The version
of the FDM we need for our purposes is given in [2], Theorem 4.1. It is based on the
notion of quasi-Hadamard differentiability, which is also introduced in [2], Definition 2.1.

Let $\mathbb{C}_\lambda$ be the space of all continuous functions in $\mathbb{D}_\lambda$, and notice that $\mathbb{C}_\lambda$ is separable
w.r.t. $\|\cdot\|_\lambda$. For every $f$ in the domain $\mathbf{U}$ of $U$ we define a functional $\dot{U}_f : \mathbb{C}_\lambda \to \mathbb{R}$ by setting

$$\dot{U}_f(v) := -\int v(x)\, dg_{1,f}(x) - \int v(x)\, dg_{2,f}(x), \qquad v \in \mathbb{C}_\lambda, \tag{9}$$

where $g_{i,f}$ is defined analogously to $g_{i,F}$ (cf. Assumption 2.1(b)). Lemma 4.1 below
shows that, subject to Assumption 2.1(a)-(c), the functional $U$ is quasi-Hadamard
differentiable at $f := F$ tangentially to $\mathbb{C}_\lambda\langle\mathbb{D}_\lambda\rangle$ with quasi-Hadamard derivative $\dot{U}_F$. Thus,
assumption (iv) of Theorem 4.1 in [2] (with $f = U$, $\mathbf{V}_f = \mathbf{U}$, $(\mathbf{V}', \|\cdot\|_{\mathbf{V}'}) = (\mathbb{R}, |\cdot|)$,
$(\mathbf{V}_0, \|\cdot\|_{\mathbf{V}_0}) = (\mathbb{D}_\lambda, \|\cdot\|_\lambda)$, $\mathbf{C}_0 = \mathbb{C}_\lambda$, $\theta = F$ and $T_n = F_n$) is fulfilled. Therefore, the
statement of Theorem 2.3 would follow from the FDM given in Theorem 4.1 in [2] if we could
verify that conditions (i)-(iii) of this theorem are also satisfied. Conditions (i) and (ii)
are satisfied by Assumption 2.1(d) and (e), respectively. It thus remains to verify (iii), that
is, that the mapping $\omega \mapsto U(W(\omega) + F)$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable whenever $W$ is a
measurable mapping from some measurable space $(\Omega, \mathcal{F})$ to $(\mathbb{D}_\lambda, \mathcal{D}_\lambda)$ such that $W(\omega) + F \in \mathbf{U}$
for all $\omega \in \Omega$. Since $W$ is $(\mathcal{F}, \mathcal{D}_\lambda)$-measurable and $\mathcal{D}_\lambda$ is the projection $\sigma$-field, we obtain
in particular $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurability of $\omega \mapsto W(x, \omega)$ for every $x$. Along with the
representation (1), this yields $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurability of $\omega \mapsto U(W(\omega) + F)$. □

We emphasize that Theorem 2.3 is quite a flexible tool to derive the asymptotic
distribution of the plug-in estimate $U(F_n)$. In fact: Apart from checking the technical
Assumptions 2.1(a)-(d), it is enough to establish the CLT (5) for $F_n$ in order to obtain the
CLT (6) for $U(F_n)$. Section 3 below demonstrates this flexibility by various examples.

Remark 2.4. If $B^\circ$ in Theorem 2.3 is a Gaussian process with zero mean and measurable
covariance function $\Gamma$ and if $\int\!\!\int \Gamma(x, y)\, dg_{i,F}(x)\, dg_{j,F}(y)$ exists for every $i, j \in \{1, 2\}$, then
the random variable $\dot{U}_F(B^\circ)$ defined in (7) is normally distributed with mean 0 and
variance

$$\sigma^2 := \sum_{i=1}^{2} \sum_{j=1}^{2} \int\!\!\int \Gamma(x, y)\, dg_{i,F}(x)\, dg_{j,F}(y). \tag{10}$$

Remark 2.5. If $\mathbb{E}[|g(X_1, X_1)|] < \infty$ (in Examples 3.1 and 3.2 below we even have
$g(x, x) = 0$ for all $x \in \mathbb{R}$), then the particular V-statistic $U(F_n)$ and the U-statistic $U_n$
(defined in (3) and (4), resp.) have the same asymptotic distribution. To see this, we first
of all note that (for $n \ge 2$)

$$\begin{aligned}
\sqrt{n}(U_n - U(F))
&= \sqrt{n}(U_n - U(F_n)) + \sqrt{n}(U(F_n) - U(F)) \\
&= \frac{\sqrt{n}}{n-1}\, U(F_n) - \frac{\sqrt{n}}{n(n-1)} \sum_{i=1}^n g(X_i, X_i) + \sqrt{n}(U(F_n) - U(F)) \\
&=: S_1(n) - S_2(n) + \sqrt{n}(U(F_n) - U(F)).
\end{aligned} \tag{11}$$

As $\sqrt{n}(U(F_n) - U(F))$ converges weakly to some nondegenerate limit, we obtain by
Slutsky's lemma that $S_1(n) = \frac{1}{n-1} \sqrt{n}(U(F_n) - U(F)) + \frac{\sqrt{n}}{n-1}\, U(F)$ converges in
probability to zero. Further, by the Markov inequality we know that, for every $\varepsilon > 0$ fixed,
$\mathbb{P}[|S_2(n)| > \varepsilon]$ is bounded above by $\frac{1}{\varepsilon} \mathbb{E}[|S_2(n)|]$ which, in turn, is bounded above by
$\frac{\sqrt{n}}{\varepsilon (n-1)} \mathbb{E}[|g(X_1, X_1)|]$. So we also have that $S_2(n)$ converges in probability to zero. Slutsky's
lemma and (11) thus imply that $\sqrt{n}(U_n - U(F))$ has indeed the same limit distribution
as $\sqrt{n}(U(F_n) - U(F))$.
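
The algebraic identity behind this remark, $\sqrt{n}(U_n - U(F_n)) = S_1(n) - S_2(n)$, can be checked numerically. The sketch below is only a sanity check under assumed inputs: the kernel $g(a, b) = ab$ (chosen because it does not vanish on the diagonal), the simulated standard normal sample, and all function names are hypothetical.

```python
import math
import random


def v_stat(xs, g):
    """V-statistic: average of g over all n^2 ordered pairs."""
    n = len(xs)
    return sum(g(a, b) for a in xs for b in xs) / n**2


def u_stat(xs, g):
    """U-statistic: average of g over the n(n-1) ordered pairs with i != j."""
    n = len(xs)
    off_diag = sum(g(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    return off_diag / (n * (n - 1))


def s1_s2(xs, g):
    """The two correction terms S_1(n) and S_2(n) from the decomposition of sqrt(n)(U_n - U(F_n))."""
    n = len(xs)
    s1 = math.sqrt(n) / (n - 1) * v_stat(xs, g)
    s2 = math.sqrt(n) / (n * (n - 1)) * sum(g(x, x) for x in xs)
    return s1, s2


def product_kernel(a, b):
    """Hypothetical kernel with nonzero diagonal, g(a, b) = a * b."""
    return a * b


rng = random.Random(0)  # fixed seed so the check is reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]

lhs = math.sqrt(len(xs)) * (u_stat(xs, product_kernel) - v_stat(xs, product_kernel))
s1, s2 = s1_s2(xs, product_kernel)
rhs = s1 - s2  # agrees with lhs up to floating-point rounding
```

Both correction terms carry a factor of order $n^{-1/2}$, which is why they vanish in probability in the argument above.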

Remark 2.6. The linear part of the Hoeffding decomposition of $U_n - U(F)$ (cf. [23],
page 178) multiplied by $\sqrt{n}$ can be written as $\sum_{i=1}^{2} \int g_{i,F}\, d(\sqrt{n}(F_n - F))$ or, for example
using the integration-by-parts formula (22), as $-\sum_{i=1}^{2} \int \sqrt{n}(F_n - F)\, dg_{i,F}$. Then, if we
could show that the degenerate part of $U_n$ converges in probability to zero (which is
nontrivial for dependent data), we could recover (6) with $U_n$ in place of $U(F_n)$ by using (5)
and the Continuous Mapping theorem.

3. Examples 

In this section, we give some examples for $g$, $F$ and $F_n$ satisfying Assumption 2.1. At first,
in Section 3.1, we provide examples for $g$ (and $F$) satisfying Assumptions 2.1(a)-(b).
Thereafter, in Section 3.2, we will give examples for $F_n$ (and $F$) satisfying
Assumptions 2.1(d)-(e) for various types of data. We assume throughout this section that
Assumption 2.1(c) is fulfilled because its meaning is rather obvious and the conditions
imposed by it are fairly weak.

3.1. Examples for g 

In [1], one can find a number of examples for kernels $g$ for which $U(F)$ corresponds
to a popular characteristic of $F$. By means of two popular examples, we now illustrate
how to verify Assumptions 2.1(a)-(b). It will be seen that the verification of these
assumptions is easy, though at first glance it may seem cumbersome. We will use the
notion of Jordan decomposition $\psi = \psi(c) + \psi_c^+ - \psi_c^-$ centered at some point $c \in \mathbb{R}$. For
the reader's convenience, we have recalled the essentials in Appendix A.

Example 3.1 (Gini's mean difference). If $g(x_1, x_2) = |x_1 - x_2|$ and $F$ has a finite
first moment, then $U(F)$ equals Gini's mean difference $\mathbb{E}[|X_1 - X_2|]$ of two i.i.d.
random variables $X_1$ and $X_2$ on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with d.f. $F$. Then
Assumptions 2.1(a)-(b) are fulfilled for $\lambda' = 1$. Indeed: We have
$g_{x_2}(x_1) = (x_1 - x_2) \mathbf{1}_{(x_2, \infty]}(x_1) - (x_1 - x_2) \mathbf{1}_{[-\infty, x_2]}(x_1)$, so that the first part of Assumption 2.1(a)
obviously holds. Further, the Jordan decomposition (18) of $g_{x_2}$ centered at $c = x_2$
reads as $g_{x_2}(x_1) = 0 + g_{x_2, x_2}^+(x_1) - g_{x_2, x_2}^-(x_1)$, where $g_{x_2, x_2}^+(x_1) = (x_1 - x_2) \mathbf{1}_{(x_2, \infty]}(x_1)$
and $g_{x_2, x_2}^-(x_1) = (x_1 - x_2) \mathbf{1}_{[-\infty, x_2]}(x_1)$, and so, in view of Lemma A.1,
$dg_{x_2}^+(x_1) = \mathbf{1}_{(x_2, \infty]}(x_1)\, dx_1$ and $dg_{x_2}^-(x_1) = \mathbf{1}_{[-\infty, x_2]}(x_1)\, dx_1$. Now it can be seen easily that also
the second part of Assumption 2.1(a) holds; we omit the details. Let us now turn to
Assumption 2.1(b). We have

$$\begin{aligned}
g_{1,F}(x_1)
&= \mathbb{E}[X_2 \mathbf{1}_{(x_1, \infty]}(X_2)] - x_1 \mathbb{P}[X_2 > x_1] + x_1 \mathbb{P}[X_2 \le x_1] - \mathbb{E}[X_2 \mathbf{1}_{[-\infty, x_1]}(X_2)] \\
&= x_1 (2F(x_1) - 1) - \mathbb{E}[X_2] + 2\, \mathbb{E}[X_2 \mathbf{1}_{(x_1, \infty]}(X_2)] \\
&= K + x_1 + 2 \bigl( -x_1 (1 - F(x_1)) + \mathbb{E}[X_2 \mathbf{1}_{(x_1, \infty]}(X_2)] \bigr)
\end{aligned}$$

with $K := -\mathbb{E}[X_2]$. The same representation holds for $g_{2,F}$. So we obviously have
$g_{i,F} = \bar{g}_{i,F} \in \mathbb{D}_{-1} \cap \mathrm{BV}_{\mathrm{loc}}$ for $i = 1, 2$. Moreover, we have $g_{i,F}'(x) = 2F(x) - 1$, and so there is
some constant $c \in \mathbb{R}$ such that $g_{i,F}$ is nonincreasing on $(-\infty, c)$ and is nondecreasing on
$(c, \infty)$, for $i = 1, 2$. Since the density of $|dg_{i,F}|$ on $(-\infty, c)$ and the density of $|dg_{i,F}|$ on
$(c, \infty)$ are bounded, we also have $\int \phi_{-\lambda}(x)\, |dg_{i,F}|(x) < \infty$ for $i = 1, 2$ and every $\lambda > 1$.
That is, all parts of Assumption 2.1(b) hold true. Thus, Assumptions 2.1(a)-(b) hold true.

If also Assumptions 2.1(d)-(e) hold true, then we obtain from Theorem 2.3 for the
kernel $g(x_1, x_2) = |x_1 - x_2|$ that $\dot{U}_F(B^\circ) = 2 \int B^\circ(x) (1 - 2F(x))\, dx$, because
$dg_{1,F}(x) = dg_{2,F}(x) = (2F(x) - 1)\, dx$.

Example 3.2 (Variance). If $g(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$ and $F$ has a finite second moment,
then $U(F)$ equals the variance of $F$. In this case, Assumptions 2.1(a)-(b) are fulfilled
for $\lambda' = 2$. The verification of this is even easier than the elaborations in Example 3.1. We
note that this time, we obtain $dg_{x_2}^+(x_1) = (x_1 - x_2) \mathbf{1}_{(x_2, \infty]}(x_1)\, dx_1$ and
$dg_{x_2}^-(x_1) = (x_2 - x_1) \mathbf{1}_{[-\infty, x_2]}(x_1)\, dx_1$ as well as
$dg_{i,F}^+(x_i) = (x_i - \mathbb{E}[X_j]) \mathbf{1}_{(\mathbb{E}[X_j], \infty]}(x_i)\, dx_i$ and
$dg_{i,F}^-(x_i) = (\mathbb{E}[X_j] - x_i) \mathbf{1}_{[-\infty, \mathbb{E}[X_j]]}(x_i)\, dx_i$ for $i, j \in \{1, 2\}$ with $i \neq j$.

If also Assumptions 2.1(d)-(e) hold true, then we obtain from Theorem 2.3 for the
kernel $g(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$ that $\dot{U}_F(B^\circ) = 2 \int B^\circ(x) (\mathbb{E}[X_1] - x)\, dx$, because
$dg_{1,F}(x) = dg_{2,F}(x) = (x - \mathbb{E}[X_1])\, dx$.
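
For the variance kernel, the limit law can be explored by simulation. The sketch below is a rough Monte Carlo illustration under assumptions that are not part of the example itself: i.i.d. standard normal data, the empirical d.f. as $F_n$, and the classical fact that the limiting variance in the i.i.d. case is $\mathbb{E}[(X - \mu)^4] - \sigma^4$ (equal to 2 for the standard normal). Sample size, number of replications and seed are arbitrary choices.

```python
import math
import random


def v_stat_variance_kernel(xs):
    """V-statistic with g(x1, x2) = (x1 - x2)^2 / 2, which equals (1/n) * sum (x_i - mean)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def simulate_scaled_errors(n, reps, seed=0):
    """Draw `reps` samples of size n from N(0, 1) and return sqrt(n) * (U(F_n) - U(F))."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        errors.append(math.sqrt(n) * (v_stat_variance_kernel(xs) - 1.0))  # U(F) = 1 here
    return errors


def sample_variance(values):
    """Unbiased sample variance of a list of floats."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


errors = simulate_scaled_errors(n=400, reps=2000)
empirical_var = sample_variance(errors)
limit_var = 2.0  # assumed i.i.d. N(0, 1) case: fourth central moment 3 minus sigma^4 = 1
```

The simulated variance should land close to `limit_var`, though only up to Monte Carlo error and a finite-sample bias of order $1/n$.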

3.2. Examples for F n 

Here we will give some examples for estimators $F_n$ for $F$ that satisfy Assumption 2.1(d)-(e).
We first consider the case of i.i.d. data.

Example 3.3 (Empirical d.f. of i.i.d. data). Let $X_1, X_2, \ldots$ be a sequence of i.i.d.
random variables with d.f. $F$, and let $\lambda > 0$. If $F$ has a finite $\gamma$-moment for some $\gamma > 2\lambda$,
then Theorem 6.2.1 in [26] shows that for the empirical d.f. $F_n$ of $X_1, \ldots, X_n$,

$$\sqrt{n}(F_n - F) \xrightarrow{d} B_F^\circ \quad (\text{in } (\mathbb{D}_\lambda, \mathcal{D}_\lambda, \|\cdot\|_\lambda)), \tag{12}$$

where $B_F^\circ$ is an $F$-Brownian bridge, that is, a centered Gaussian process with covariance
function $\Gamma(x, y) = F(x \wedge y)(1 - F(x \vee y))$. Thus, if $\lambda > 0$, if $F$ has a finite $\gamma$-moment for
some $\gamma > 2\lambda$, and if $g$ is a kernel satisfying Assumptions 2.1(a)-(b) for $F$ and some
$\lambda' \in [0, \lambda)$, then Theorem 2.3 shows that the law of $\sqrt{n}(U(F_n) - U(F))$ converges weakly
to the normal distribution with mean 0 and variance given by (10) with
$\Gamma(x, y) = F(x \wedge y)(1 - F(x \vee y))$. Alternatively, the result can be stated as follows: If $g$ is a fixed kernel and
$\mathbf{F}_{g, \lambda'}$ denotes the class of all d.f. $F$ for which Assumptions 2.1(a)-(b) hold with $\lambda' > 0$,
then $\sqrt{n}(U(F_n) - U(F))$ converges weakly to the above mentioned normal distribution
for every $F \in \mathbf{F}_{g, \lambda'}$ having a finite $\gamma$-moment for some $\gamma > 2\lambda'$. Indeed: In this case, we
can choose $\lambda \in (\lambda', \gamma/2)$ in Assumption 2.1(e).
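
The variance formula (10) can be evaluated numerically once $\Gamma$ and the measures $dg_{i,F}$ are known. As a hedged sketch, the code below does this for Gini's kernel from Example 3.1, where $dg_{1,F}(x) = dg_{2,F}(x) = (2F(x) - 1)\,dx$, combined with the Brownian-bridge covariance of this example. The choice of the uniform distribution on $[0, 1]$, the midpoint rule and the function names are assumptions made for illustration; for this particular $F$ a direct calculation gives the value $1/45$.

```python
def brownian_bridge_cov(x, y, F):
    """Gamma(x, y) = F(min(x, y)) * (1 - F(max(x, y))), the i.i.d. covariance function."""
    return F(min(x, y)) * (1.0 - F(max(x, y)))


def gini_limit_variance(F, lo, hi, m=400):
    """Midpoint-rule approximation of (10) for g(x1, x2) = |x1 - x2|.

    Since dg_{1,F} = dg_{2,F} = (2F - 1) dx, the four terms of the double sum coincide,
    so the result is 4 times a single double integral over [lo, hi]^2.
    """
    h = (hi - lo) / m
    pts = [lo + (k + 0.5) * h for k in range(m)]
    w = [2.0 * F(x) - 1.0 for x in pts]
    total = 0.0
    for i, x in enumerate(pts):
        for j, y in enumerate(pts):
            total += brownian_bridge_cov(x, y, F) * w[i] * w[j]
    return 4.0 * total * h * h


def uniform_cdf(x):
    """D.f. of the uniform distribution on [0, 1] (assumed example distribution)."""
    return min(max(x, 0.0), 1.0)


# Gamma vanishes outside [0, 1] for this F, so integrating over [0, 1]^2 suffices.
sigma2 = gini_limit_variance(uniform_cdf, 0.0, 1.0)
```

The same routine could be reused for other continuous $F$ by truncating the integration range, at the cost of an additional truncation error that this sketch does not control.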

Example 3.4 (Smoothed empirical d.f. of i.i.d. data). Suppose that in the setting
of Example 3.3 the empirical d.f. $F_n$ is smoothed out by the heat kernel $p_{\varepsilon_n}(\cdot)$ with
bandwidth $\varepsilon_n > 0$, that is, that $F_n$ is replaced by $P_{\varepsilon_n} F_n$ with $(P_\varepsilon)_{\varepsilon \ge 0}$ the heat semigroup
(i.e., $P_\varepsilon \psi := \int_{\mathbb{R}} \psi(y)\, p_\varepsilon(\cdot - y)\, dy$ for $\varepsilon > 0$, and $P_0 := I$). Then, if $F$ is also Lipschitz
continuous and $\sqrt{n}\, \varepsilon_n^{(\gamma - \lambda)/(2\gamma)} \to 0$, the CLT (12) (with $F_n$ replaced by $P_{\varepsilon_n} F_n$) still holds
(cf. Corollary A.2 in [2]), and therefore the weak limit of the law of $\sqrt{n}(U(P_{\varepsilon_n} F_n) - U(F))$
is still the normal distribution with mean 0 and variance given by (10) with
$\Gamma(x, y) = F(x \wedge y)(1 - F(x \vee y))$. Of course, at this point we have to ensure that under the imposed
assumptions the expression $U(P_{\varepsilon_n} F_n)$ is well defined, that is, that Assumption 2.1(d) is
satisfied. Now, it can be easily deduced from Lemma 3.2 in [32] that in our setting $P_{\varepsilon_n} F_n$
lies in $\mathbb{D}_\lambda$. Thus, if we assume that, for example,
$\sup_{x_1, x_2 \in \mathbb{R}} |g(x_1, x_2)|\, \phi_{-\lambda'}(x_1)\, \phi_{-\lambda'}(x_2) < \infty$, Assumption 2.1(d) follows easily.
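
A smoothed empirical d.f. of this type has a simple closed form: convolving each indicator $\mathbf{1}_{[X_i, \infty)}$ with a Gaussian density yields a normal d.f. centred at $X_i$. The sketch below assumes, as a convention chosen here and not fixed by the text, that $p_\varepsilon$ is the $N(0, \varepsilon)$ density; the function names and the toy data are likewise hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal d.f. expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_edf(xs, eps):
    """Return the function x -> (P_eps F_n)(x).

    Assumption: p_eps is the density of N(0, eps); other normalisations of the
    heat kernel only rescale eps. For eps = 0 the empirical d.f. itself is returned.
    """
    n = len(xs)
    if eps == 0.0:
        return lambda x: sum(1.0 for xi in xs if xi <= x) / n
    sd = math.sqrt(eps)
    return lambda x: sum(normal_cdf((x - xi) / sd) for xi in xs) / n


# Hypothetical toy data.
data = [-1.0, 0.0, 2.0]
F_n = smoothed_edf(data, 0.0)        # step function with jumps of size 1/3
F_smooth = smoothed_edf(data, 0.01)  # continuous, nondecreasing, with values in (0, 1)
```

With a small bandwidth the smoothed version differs from the step function only near the data points; for instance, at the middle observation it takes the value halfway across the jump.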

Let us now turn to the case of dependent data, which is our actual objective.
Throughout the examples presented below, we consider a strictly stationary sequence
$(X_i) = (X_i)_{i \ge 1}$ of random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with continuous
d.f. $F$, and let as before $F_n$ denote the corresponding empirical d.f. at stage $n$. By strict
stationarity, we mean that the joint distribution of $X_{i+1}, \ldots, X_{i+m}$ does not depend on $i$
for every fixed positive integer $m$. We will consider three popular dependency structures
($\alpha$-, $\beta$- and $\rho$-mixing) in more detail in Examples 3.5, 3.6, and 3.7, respectively. There,
we will also provide a comparison of the results obtained by the approach considered here
and the results obtained up to now. For the definition of $\alpha$-, $\beta$- and $\rho$-mixing (and other
mixing conditions) and for examples of strictly stationary $\alpha$-, $\beta$- and $\rho$-mixing sequences
see, for example, [3, 11, 17]. As usual, the corresponding mixing coefficients will be
referred to as $\alpha(n)$, $\beta(n)$ and $\rho(n)$, respectively. The application of our method to other
dependence concepts will be discussed in Example 3.8. Notice that the condition of
$\alpha$-mixing is weaker than the condition of $\beta$-mixing (absolute regularity) under which CLTs
for U-statistics have been established in [8, 31]. A CLT for strictly stationary $\alpha$-mixing
(strongly mixing) sequences of random variables has been given in [6].

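Example 3.5 (Empirical d.f. of $\alpha$-mixing data). Let $(X_i)$ be $\alpha$-mixing with
$\alpha(n) = O(n^{-\theta})$ for some $\theta > 1 + \sqrt{2}$, and let $\lambda > 0$. If $F$ has a finite $\gamma$-moment for some
$\gamma > 2\theta\lambda/(\theta - 1)$, then it can easily be deduced from Theorem 2.2 in [24] that

$$\sqrt{n}(F_n - F) \xrightarrow{d} B_F^\circ \quad (\text{in } (\mathbb{D}_\lambda, \mathcal{D}_\lambda, \|\cdot\|_\lambda)) \tag{13}$$

with $B_F^\circ$ a continuous centered Gaussian process with covariance function

$$\Gamma(s, t) = F(s \wedge t)(1 - F(s \vee t)) + \sum_{k=2}^{\infty} \bigl[ \mathrm{Cov}(\mathbf{1}_{\{X_1 \le s\}}, \mathbf{1}_{\{X_k \le t\}}) + \mathrm{Cov}(\mathbf{1}_{\{X_1 \le t\}}, \mathbf{1}_{\{X_k \le s\}}) \bigr] \tag{14}$$

(cf. Section 3.3 in [2]). Thus, if $g$ is a fixed kernel and $\mathbf{F}_{g, \lambda'}$ denotes the class of all
d.f. satisfying Assumptions 2.1(a)-(b) for some $\lambda' > 0$, then Theorem 2.3 shows that
the law of $\sqrt{n}(U(F_n) - U(F))$ converges weakly to the normal distribution with mean 0
and variance given by (10), with $\Gamma$ as in (14), for every d.f. $F \in \mathbf{F}_{g, \lambda'}$ having a finite
$\gamma$-moment for some $\gamma > 2\theta\lambda'/(\theta - 1)$. Indeed: In this case we can choose
$\lambda \in (\lambda', \gamma(\theta - 1)/(2\theta))$ in Assumption 2.1(e).

E. Beutner and H. Zähle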

To compare our result with that of Theorem 1.8 in [6], we consider the kernel
$g(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$. For Theorem 1.8 in [6] to be applicable, we must assume that $F$
has a finite $\gamma$-moment for some $\gamma > 4$ (the same condition is necessary to ensure that
the approach considered here works). In this case, both integrability conditions in
Theorem 1.8 in [6] are fulfilled, and the condition on the mixing coefficients reads as follows:
$\alpha(n) = O(n^{-\theta})$ for some
$\theta > \frac{3}{2} + \frac{1}{2\gamma} + \frac{5}{\gamma - 4} + \frac{2}{\gamma(\gamma - 4)} = \frac{3\gamma - 1}{2(\gamma - 4)}$. On the other hand, if $F$ has
a finite $\gamma$-moment for some $\gamma > 4$, in our setting we may choose $\lambda' = 2$, and so
$\theta > \frac{\gamma}{\gamma - 4}$ (and $\lambda \in (2, \frac{\gamma(\theta - 1)}{2\theta})$). Hence, our condition on the mixing coefficients reads as follows:
$\alpha(n) = O(n^{-\theta})$ for some $\theta > \frac{\gamma}{\gamma - 4}$. Notice that $\frac{3\gamma - 1}{2(\gamma - 4)} > \frac{\gamma}{\gamma - 4}$ holds for all $\gamma > 4$. Taking
into account that in our setting, we must choose $\theta > 1 + \sqrt{2}$ for the result of [24] to be
applicable, we find that our result relies on a weaker assumption on the mixing coefficients
than Theorem 1.8 in [6] whenever $\frac{3\gamma - 1}{2(\gamma - 4)} > 1 + \sqrt{2}$, that is, $\gamma < \frac{7 + 8\sqrt{2}}{2\sqrt{2} - 1}$.

Example 3.6 (Empirical d.f. of $\beta$-mixing data). Let $(X_i)$ be $\beta$-mixing with
$\beta(n) = O(n^{-\theta})$ for some $\theta > \frac{\kappa}{\kappa - 1}$ with $\kappa > 1$, and let $\lambda > 0$. If $F$ has a finite $\gamma$-moment for some
$\gamma > 2\lambda\kappa$, then it can easily be deduced from Lemma 4.1 in [4] that the CLT (13) still
holds and that the covariance function is again given by (14). Thus, if $g$ is a fixed kernel
and $\mathbf{F}_{g, \lambda'}$ denotes the class of all d.f. satisfying Assumptions 2.1(a)-(b) for some $\lambda' > 0$,
then Theorem 2.3 shows that the law of $\sqrt{n}(U(F_n) - U(F))$ converges weakly to the
normal distribution with mean 0 and variance given by (10), with $\Gamma$ as in (14), for every
d.f. $F \in \mathbf{F}_{g, \lambda'}$ having a finite $\gamma$-moment for some $\gamma > 2\lambda'\kappa$. Indeed: In this case we can
choose $\lambda \in (\lambda', \frac{\gamma}{2\kappa})$.

To compare our result with that of Theorem 3.1 in [31] (see also Theorem 1.8 in [6]),
we consider the kernel $g(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$. For this theorem to be applicable, we
must again assume that $F$ has a finite $\gamma$-moment for some $\gamma > 4$ (the same condition is
again necessary to ensure that the approach considered here works). In this case, both
integrability conditions in Theorem 3.1 in [31] (see also Theorem 1.8 in [6]) are fulfilled,
and the condition on the mixing coefficients reads as follows: $\beta(n) = O(n^{-\theta})$ for some
$\theta > \frac{\gamma}{\gamma - 4}$. On the other hand, if $F$ has a finite $\gamma$-moment for some $\gamma > 4$, in our setting
we may choose $\lambda' = 2$, and so $\kappa < \gamma/4$ (and $\lambda \in (2, \frac{\gamma}{2\kappa})$). Hence, in view of $\theta > \frac{\kappa}{\kappa - 1}$, our
condition on the mixing coefficients reads as follows: $\beta(n) = O(n^{-\theta})$ for some $\theta > \frac{\gamma}{\gamma - 4}$.
That is, both results impose the same condition on the mixing coefficients.

Example 3.7 (Empirical d.f. of $\rho$-mixing data). Let $(X_i)$ be $\rho$-mixing with
$\sum_{n=1}^{\infty} \rho(2^n) < \infty$, suppose
$\sum_{k=2}^{\infty} |\mathrm{Cov}(\mathbf{1}_{\{X_1 \le s\}}, \mathbf{1}_{\{X_k \le t\}}) + \mathrm{Cov}(\mathbf{1}_{\{X_1 \le t\}}, \mathbf{1}_{\{X_k \le s\}})| < \infty$,
and let $\lambda > 0$. If $F$ has a finite $\gamma$-moment for some $\gamma > \lambda(2 + \varepsilon)$ with $\varepsilon > 0$, then it
can easily be deduced from Theorem 2.3 in [24] that the CLT (13) still holds and that
the covariance function is again given by (14) (cf. Section 3.3 in [2]). Hence, we again
have in this case: If $g$ is a fixed kernel and if we denote by $\mathbf{F}_{g, \lambda'}$ the class of all d.f. for
which Assumptions 2.1(a)-(b) hold for some $\lambda' > 0$, then Theorem 2.3 yields that the
law of $\sqrt{n}(U(F_n) - U(F))$ converges weakly to the normal distribution with mean 0 and
variance given by (10) with $\Gamma$ as in (14) for every $F \in \mathbf{F}_{g, \lambda'}$ having a finite $\gamma$-moment for
some $\gamma > \lambda'(2 + \varepsilon)$. Indeed: In this case, we can choose $\lambda \in (\lambda', \gamma/(2 + \varepsilon))$.

To the best of our knowledge, the asymptotic distribution of U- and V-statistics of
$\rho$-mixing data has not been studied explicitly so far. Of course, every $\rho$-mixing sequence is
also $\alpha$-mixing (since $\alpha(n) \le \frac{1}{4}\rho(n)$; see [3], Inequality (1.12)), but the condition on the
mixing coefficients imposed in Example 3.7 is considerably weaker than the condition
on the mixing coefficients imposed in Example 3.5. Similar statements apply to further
dependence concepts, which are also covered by our approach.

Example 3.8 (Further examples). Recently, a new dependence structure for
sequences of random variables was introduced in [30]. Not surprisingly, limit distributions
for U- and V-statistics under this dependence concept have not been derived so
far. However, in [30] it was also proved that, subject to certain conditions, the weighted
empirical process $\sqrt{n}(F_n - F)\phi_\gamma$ converges weakly to a tight Gaussian process. Here $F_n$
is the empirical d.f. based on a sequence of random variables fulfilling this dependence
condition. From our Theorem 2.3 one can thus (along the lines of Examples 3.5, 3.6,
and 3.7) derive the limit distribution of U- and V-statistics when the data fulfill the
dependence structure in [30]. We omit the details.

In [10], the limit distribution of U-statistics for associated sequences was derived using
the Hoeffding decomposition. To prove asymptotic normality of U-statistics for stationary
and associated sequences, it was required there that the partial derivatives of $g$ are
uniformly bounded. This clearly excludes the variance of a random variable. On the other
hand, our approach also covers the variance for the case of stationary and associated
sequences. Indeed: Let $(X_i)$ be a stationary, associated sequence with
$\mathrm{Cov}(X_1, X_n) = O(n^{-\nu - \varepsilon})$ for some $\nu > (3 + \sqrt{33})/2$ and $\varepsilon > 0$. Then, we can deduce from Theorem 2.4
in [24] that the CLT (13) still holds and the covariance function is again given by (14)
whenever $F$ has a finite $\gamma$-moment for some $\gamma > \frac{2\nu\lambda}{\nu - 3}$ ($\lambda > 0$ fixed). Hence, we obtain from
Theorem 2.3 (recall from Example 3.2 that Assumptions 2.1(a)-(b) are fulfilled for the
variance with $\lambda' = 2$) that the variance is included in our method of proof whenever $F$
has a finite $\gamma$-moment for some $\gamma > \frac{4\nu}{\nu - 3}$; in this case we can choose $\lambda \in (2, \gamma(\nu - 3)/(2\nu))$.

4. Quasi-Hadamard differentiability of U 

This section is concerned with the quasi-Hadamard differentiability (in the sense of Definition 2.1 in [2]) of the functional $U$ defined in (8). Recall that quasi-Hadamard differentiability is needed in the proof of Theorem 2.3. Recall also that $\mathbb{BV}_{1,d}$ is the space of all càdlàg functions in $\mathbb{BV}_{\mathrm{loc}}$ with variation bounded by 1, and that $\mathbf{U}$ is the class of all nonnegative and nondecreasing functions $f \in \mathbb{BV}_{1,d}$ for which the integral on the right-hand side of equation (8) and the integral $\int \phi_{\lambda'}(x)\,df(x)$ exist. Moreover, we let $\mathbb{BV}_{\mathrm{loc},d}$ be the space of all càdlàg functions in $\mathbb{BV}_{\mathrm{loc}}$.
Lemma 4.1. Under Assumptions 2.1(a)-(c) (the continuity of $F$ is actually superfluous at this point), the functional $U$ defined in (8) is quasi-Hadamard differentiable at $f := F$ tangentially to $\mathbb{C}_\lambda\langle\mathbb{D}_\lambda\rangle$ with quasi-Hadamard derivative given by $\dot U_f$ defined in (9) with $f := F$.



Proof. To prove the claim, we have to show that

\[
\lim_{n\to\infty} \left| \dot U_f(v) - \frac{U(f + h_n v_n) - U(f)}{h_n} \right| = 0 \tag{15}
\]

holds for each triplet $(v, (v_n), (h_n))$ with $v \in \mathbb{C}_\lambda$, $(v_n) \subset \mathbb{D}_\lambda$ satisfying $f + h_n v_n \in \mathbf{U}$ (for all $n \in \mathbb{N}$) as well as $\|v_n - v\|_\lambda \to 0$, and $(h_n) \subset \mathbb{R}_0 := \mathbb{R} \setminus \{0\}$ satisfying $h_n \to 0$. Let $f_n := f + h_n v_n$. We stress the fact that $f_n$ lies in $\mathbf{U}$, which is a subset of $\mathbb{BV}_{1,d}$, and that consequently $h_n v_n$ is the difference of two functions which both lie in $\mathbf{U}$ (notice that $f$ lies in $\mathbf{U}$ by Assumption 2.1(c)). For the verification of (15), we now proceed in two steps.

Step 1. To justify the analysis in Step 2 below, we first of all show that the three integrals

\[
\int |g_{1,f}|(x_1)\,|dv_n|(x_1), \qquad \int |g_{2,f}|(x_2)\,|dv_n|(x_2), \qquad \iint |g(x_1,x_2)|\,|dv_n|(x_1)\,|dv_n|(x_2)
\]

are finite for all $n \in \mathbb{N}$. For the finiteness of these integrals, it suffices to show that for every $n \in \mathbb{N}$

\[
\iint |g(x_1,x_2)|\,df_n(x_1)\,df(x_2) < \infty \quad\text{and}\quad \iint |g(x_1,x_2)|\,df(x_1)\,df_n(x_2) < \infty, \tag{16}
\]

since $|g_{1,f}| \le \int |g(\cdot,x_2)|\,df(x_2)$ and $|g_{2,f}| \le \int |g(x_1,\cdot)|\,df(x_1)$, since $|h_n|\,|dv_n| \le df_n + df$, and since $f, f_n \in \mathbf{U}$ implies

\[
\iint |g(x_1,x_2)|\,df(x_1)\,df(x_2) < \infty \quad\text{and}\quad \iint |g(x_1,x_2)|\,df_n(x_1)\,df_n(x_2) < \infty.
\]

(Notice that (16) by itself is also needed in Step 2 below.) We clearly have

\[
\iint |g(x_1,x_2)|\,df(x_1)\,df_n(x_2) \le \|\overline{g}_{2,f}\|_{-\lambda'} \int \phi_{\lambda'}(x_2)\,df_n(x_2),
\]

where $\overline{g}_{2,f}(x_2) := \int |g(x_1,x_2)|\,df(x_1)$. From the second part of Assumption 2.1(b) we have $\|\overline{g}_{2,f}\|_{-\lambda'} < \infty$, and $\int \phi_{\lambda'}(x_2)\,df_n(x_2) < \infty$ holds since $f_n \in \mathbf{U}$. That is, $\|\overline{g}_{2,f}\|_{-\lambda'} \int \phi_{\lambda'}(x_2)\,df_n(x_2) < \infty$. Similar arguments show that the first inequality in (16) holds.

Step 2. By Step 1 and the triangle inequality we have

\begin{align*}
\left| \dot U_f(v) - \frac{U(f + h_n v_n) - U(f)}{h_n} \right|
&= \Bigl| -\int v(x_1)\,dg_{1,f}(x_1) - \int v(x_2)\,dg_{2,f}(x_2) \\
&\qquad - \frac{1}{h_n}\Bigl( \iint g(x_1,x_2)\,d(f + h_n v_n)(x_1)\,d(f + h_n v_n)(x_2) - \iint g(x_1,x_2)\,df(x_1)\,df(x_2) \Bigr) \Bigr| \\
&\le \sum_{i=1}^{2} \Bigl| -\int v(x_i)\,dg_{i,f}(x_i) - \int g_{i,f}(x_i)\,dv_n(x_i) \Bigr| + \Bigl| h_n \iint g(x_1,x_2)\,dv_n(x_1)\,dv_n(x_2) \Bigr| \\
&=: \sum_{i=1}^{2} S_{1,i}(n) + S_2(n). \tag{17}
\end{align*}

In order to show that $S_{1,1}(n)$ converges to zero, we will apply the integration-by-parts formula (22) to $\int g_{1,f}(x_1)\,dv_n(x_1)$. At first, we have to make clear that formula (22) can be applied, that is, that the assumptions of Lemma B.1 are fulfilled.
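
The decomposition behind (17) can be sketched as follows. Writing $f_n = f + h_n v_n$ and expanding the double integral by bilinearity (the interchange of integrals being covered by the finiteness statements of Step 1), one obtains, in the notation $g_{1,f}(x_1) = \int g(x_1,x_2)\,df(x_2)$ and $g_{2,f}(x_2) = \int g(x_1,x_2)\,df(x_1)$:

```latex
\begin{align*}
U(f_n) - U(f)
&= \iint g \, d(f + h_n v_n)\, d(f + h_n v_n) - \iint g \, df\, df \\
&= h_n \int g_{1,f}(x_1)\, dv_n(x_1) + h_n \int g_{2,f}(x_2)\, dv_n(x_2)
   + h_n^2 \iint g(x_1,x_2)\, dv_n(x_1)\, dv_n(x_2).
\end{align*}
```

After division by $h_n$, the two linear terms are compared with the two terms of the candidate derivative, and the quadratic term, which still carries one factor $h_n$, is the remainder $S_2(n)$.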

It follows from Step 1 that the second condition in (21) holds true (where $g_{1,f}$ and $v_n$ play the roles of $u$ and $v$, resp.). Moreover, by the continuity of $\phi_{-\lambda}$ we have

\begin{align*}
\int |v_n(x_1-)|\,|dg_{1,f}|(x_1)
&= \int |v_n(x_1-)\,\phi_\lambda(x_1-)\,\phi_{-\lambda}(x_1-)|\,|dg_{1,f}|(x_1) \\
&= \int |v_n(x_1-)\,\phi_\lambda(x_1-)|\,\phi_{-\lambda}(x_1)\,|dg_{1,f}|(x_1) \\
&\le \|v_n\|_\lambda \int \phi_{-\lambda}(x_1)\,|dg_{1,f}|(x_1).
\end{align*}

By Assumption 2.1(b) and the fact that $v_n \in \mathbb{D}_\lambda$, the latter bound is finite, so that also the first condition in (21) holds true. We finally note that $\lim_{|x_1|\to\infty} v_n(x_1)\,g_{1,f}(x_1) = 0$. Indeed: On one hand, $|g_{1,f}(x_1)\,\phi_{-\lambda'}(x_1)|$ is bounded above uniformly in $x_1$ by Assumption 2.1(b) and Remark 2.2(b). On the other hand, $|v_n(x_1)\,\phi_{\lambda'}(x_1)|$ converges to 0 as $|x_1| \to \infty$ because $|v_n(x_1)\,\phi_\lambda(x_1)|$ is bounded above uniformly in $x_1$ (recall $\lambda > \lambda'$). That is, the assumptions of Lemma B.1 are indeed fulfilled.

Now, we may apply the integration-by-parts formula (22) to obtain

\begin{align*}
S_{1,1}(n)
&= \Bigl| -\int v(x_1)\,dg_{1,f}(x_1) + \int v_n(x_1-)\,dg_{1,f}(x_1) \Bigr| \\
&\le \Bigl| \int (v_n - v)(x_1)\,dg_{1,f}(x_1) \Bigr| + \Bigl| \int \bigl(v_n(x_1-) - v_n(x_1)\bigr)\,dg_{1,f}(x_1) \Bigr| \\
&\le \bigl( \|v_n - v\|_\lambda + \|v_n - v\|_\lambda + \|v - v_n\|_\lambda \bigr) \int \phi_{-\lambda}(x_1)\,|dg_{1,f}|(x_1).
\end{align*}

The latter bound converges to zero by Assumption 2.1(b) and $\|v - v_n\|_\lambda \to 0$. That is, $S_{1,1}(n) \to 0$. In the same way we obtain $S_{1,2}(n) \to 0$.

Thus, it remains to show $S_2(n) \to 0$. We will apply the integration-by-parts formula (22) to the inner integral in $S_2(n)$. So at first we will verify that formula (22) can be used, that is, that the assumptions of Lemma B.1 are fulfilled. By Assumption 2.1(a), we have $g_{x_2} \in \mathbb{BV}_{\mathrm{loc},d}$, and as mentioned above we also have $v_n \in \mathbb{BV}_{\mathrm{loc},d}$. Further, the integrals $\int g(x_1,x_2)\,df(x_1)$ and $\int g(x_1,x_2)\,df_n(x_1)$ exist by the fact that $f_n, f \in \mathbf{U}$ and Fubini's theorem. This and the representation $v_n = (f_n - f)/h_n$ imply $\int |g_{x_2}(x_1)|\,|dv_n|(x_1) < \infty$, that is, that the second condition in (21) holds true. Moreover, by the continuity of $\phi_{-\lambda}$ we have as above

\begin{align*}
\int |v_n(x_1-)|\,|dg_{x_2}|(x_1)
&= \int |v_n(x_1-)\,\phi_\lambda(x_1-)\,\phi_{-\lambda}(x_1-)|\,|dg_{x_2}|(x_1) \\
&= \int |v_n(x_1-)\,\phi_\lambda(x_1-)|\,\phi_{-\lambda}(x_1)\,|dg_{x_2}|(x_1) \\
&\le \|v_n\|_\lambda \int \phi_{-\lambda}(x_1)\,|dg_{x_2}|(x_1).
\end{align*}

By Assumption 2.1(a) and the fact that $v_n \in \mathbb{D}_\lambda$, this bound is finite, so that also the first condition in (21) holds true. We finally note that $\lim_{|x_1|\to\infty} v_n(x_1)\,g_{x_2}(x_1) = 0$. Indeed: On one hand, $|g_{x_2}(x_1)\,\phi_{-\lambda'}(x_1)|$ is bounded above uniformly in $x_1$ by Assumption 2.1(a). On the other hand, $|v_n(x_1)\,\phi_{\lambda'}(x_1)|$ converges to 0 as $|x_1| \to \infty$ since $|v_n(x_1)\,\phi_\lambda(x_1)|$ is bounded above uniformly in $x_1$ (recall $\lambda > \lambda'$). That is, the assumptions of Lemma B.1 are indeed fulfilled.

Now, we may apply the integration-by-parts formula (22) to the inner integral in $S_2(n)$ to obtain

\begin{align*}
S_2(n)
&= \Bigl| \iint v_n(x_1-)\,dg_{x_2}(x_1)\,d(f_n - f)(x_2) \Bigr| \\
&\le \Bigl| \iint \bigl(v_n(x_1-) - v(x_1-)\bigr)\,dg_{x_2}(x_1)\,d(f_n - f)(x_2) \Bigr| + \Bigl| \iint v(x_1-)\,dg_{x_2}(x_1)\,d(f_n - f)(x_2) \Bigr|.
\end{align*}

Since $f_n$ and $f$ generate positive (probability) measures, and $v$ and $\phi_{-\lambda}$ are continuous, we may continue with

\begin{align*}
&\le \|v_n - v\|_\lambda \int \Bigl( \int \phi_{-\lambda}(x_1)\,|dg_{x_2}|(x_1)\,\phi_{-\lambda'}(x_2) \Bigr) \phi_{\lambda'}(x_2)\,df_n(x_2) \\
&\quad + \|v_n - v\|_\lambda \int \Bigl( \int \phi_{-\lambda}(x_1)\,|dg_{x_2}|(x_1)\,\phi_{-\lambda'}(x_2) \Bigr) \phi_{\lambda'}(x_2)\,df(x_2) \\
&\quad + \Bigl| \iint v(x_1)\,dg_{x_2}(x_1)\,df_n(x_2) - \iint v(x_1)\,dg_{x_2}(x_1)\,df(x_2) \Bigr| \\
&\le \|v_n - v\|_\lambda \int C\,\phi_{\lambda'}(x_2)\,df_n(x_2) + \|v_n - v\|_\lambda \int C\,\phi_{\lambda'}(x_2)\,df(x_2) \\
&\quad + \Bigl| \iint v(x_1)\,dg_{x_2}(x_1)\,df_n(x_2) - \iint v(x_1)\,dg_{x_2}(x_1)\,df(x_2) \Bigr| \\
&=: S_{2,1}(n) + S_{2,2}(n) + S_{2,3}(n)
\end{align*}

with $C := \sup_{x_2} \int \phi_{-\lambda}(x_1)\,|dg_{x_2}|(x_1)\,\phi_{-\lambda'}(x_2)$ (which is finite by the second part of Assumption 2.1(a)). By Lemma B.2, which can be applied due to Assumption 2.1(a), and the facts that $v \in \mathbb{D}_\lambda$, $\|f_n - f\|_\lambda \to 0$, and that $\int \phi_{\lambda'}(x_2)\,df(x_2)$ and $\int \phi_{\lambda'}(x_2)\,df_n(x_2)$ exist, the summand $S_{2,3}(n)$ converges to 0. Since $\|v_n - v\|_\lambda \to 0$, and since $\int \phi_{\lambda'}(x_2)\,df(x_2)$ is finite because $f \in \mathbf{U}$, we also obtain $S_{2,2}(n) \to 0$. It remains to show $S_{2,1}(n) \to 0$. As $\|v_n - v\|_\lambda \to 0$, it suffices to show that $\int \phi_{\lambda'}(x_2)\,df_n(x_2)$ is uniformly bounded from above. The latter follows from the finiteness of $\int \phi_{\lambda'}(x_2)\,df(x_2)$ and Lemma B.2, which is applicable since we clearly have $\phi_{\lambda'} \in \mathbb{D}_{-\lambda'}$, and for every $n \in \mathbb{N}$ the integral $\int \phi_{\lambda'}(x_2)\,df_n(x_2)$ exists due to $f_n \in \mathbf{U}$. This proves the claim of Lemma 4.1. □



Remark. We note that the proof of Lemma 4.1 basically applies also to V-functionals of the shape $U(F) = \int \cdots \int g(x_1, \ldots, x_d)\,dF(x_1) \cdots dF(x_d)$ with arbitrary $d \ge 2$, provided Assumptions 2.1(a)-(b) (which ensure the quasi-Hadamard differentiability of $U$ in the case $d = 2$) are modified suitably and the definition of $\dot U_f$ in (9) is replaced by $\dot U_f(v) := -\sum_{i=1}^{d} \int v(x)\,dg_{i,f}(x)$ with $g_{i,f}(x_i) := \int \cdots \int g(x_1, \ldots, x_d)\,df(x_1) \cdots df(x_{i-1})\,df(x_{i+1}) \cdots df(x_d)$. In particular, Theorem 2.3 then still holds for such general V-functionals. Let us exemplify the validity of the analogue of Lemma 4.1 for the case $d = 3$. To do so, we let $\mathbb{M}_{(\lambda,\lambda)}$ be the space of all measurable functions $h : \mathbb{R}^2 \to \mathbb{R}$ such that $\sup_{x_1,x_2} |h(x_1,x_2)\,\phi_\lambda(x_1)\,\phi_\lambda(x_2)|$ is finite. To ensure the existence of the integrals as in Step 1 in the above proof, it is sufficient to require that the functions $\overline{g}_{i,j,f}(x_i,x_j) := \int |g(x_1,x_2,x_3)|\,df(x_k)$, $i,j,k \in \{1,2,3\}$, $i < j$, $k \ne i$, $k \ne j$, are in $\mathbb{M}_{(-\lambda',-\lambda')}$, and that the functions $\overline{g}_{i,f}(x_i) := \iint |g(x_1,x_2,x_3)|\,df(x_j)\,df(x_k)$, $i,j,k \in \{1,2,3\}$ pairwise distinct, lie in $\mathbb{D}_{-\lambda'}$ (cf. the second part of Assumption 2.1(b)). Then Step 1 still holds. Let us turn to Step 2 in the above proof. In (17), we now obtain the bound

\begin{align*}
S_1(n) + S_2(n) + S_3(n) := {}
& \sum_{i=1}^{3} \Bigl| -\int v(x_i)\,dg_{i,f}(x_i) - \int g_{i,f}(x_i)\,dv_n(x_i) \Bigr| \\
& + \sum_{i,j=1:\, i<j}^{3} \Bigl| h_n \iint g_{i,j,f}(x_i,x_j)\,dv_n(x_i)\,dv_n(x_j) \Bigr| \\
& + \Bigl| h_n^2 \iiint g(x_1,x_2,x_3)\,dv_n(x_1)\,dv_n(x_2)\,dv_n(x_3) \Bigr|,
\end{align*}

where $g_{i,j,f}(x_i,x_j) := \int g(x_1,x_2,x_3)\,df(x_k)$, $i,j,k \in \{1,2,3\}$, $i < j$, $k \ne i$, $k \ne j$. To obtain $S_1(n) \to 0$, it suffices to assume that the functions $g_{i,f}$ satisfy the first part of Assumption 2.1(b). To ensure that $h_n^{-1} S_2(n)$ is bounded above, it suffices to assume that, similar to the case $d = 2$, the functions $g_{i,j,f}$ satisfy Assumption 2.1(a) (with $g$ replaced by $g_{i,j,f}$). Assuming that for every fixed $x_2, x_3$ the function $g_{x_2,x_3}(\cdot) := g(\cdot, x_2, x_3)$ lies in $\mathbb{BV}_{\mathrm{loc}} \cap \mathbb{D}_{-\lambda'}$, and that $(x_2,x_3) \mapsto \int \phi_{-\lambda}(x_1)\,|dg_{x_2,x_3}|(x_1)$ lies in $\mathbb{M}_{(-\lambda',-\lambda')}$ (cf. Assumption 2.1(a)), ensures that $h_n^{-2} S_3(n)$ is bounded above. Thus, $S_1(n) + S_2(n) + S_3(n) \to 0$.

Finally, we note that the case $d = 1$ is even easier. Here, we only need to assume $g \in \mathbb{BV}_{\mathrm{loc}} \cap \mathbb{D}_{-\lambda'}$ (instead of Assumptions 2.1(a)-(b)) and to replace (9) by $\dot U_f(v) := -\int v(x)\,dg(x)$.

Appendix A: Jordan decomposition of functions in $\mathbb{BV}_{\mathrm{loc}}$

Recall that for $\psi \in \mathbb{BV}_{\mathrm{loc}}$ and $c \in \mathbb{R}$, the Jordan decomposition of $\psi$ centered at $c$,

\[
\psi = \psi(c) + \psi_c^+ - \psi_c^-, \tag{18}
\]

is characterized as follows: $\psi_c^+$ and $\psi_c^-$ are the unique nondecreasing functions satisfying

\begin{align*}
\psi_c^+(x) &= V^+([c,x],\psi), & \psi_c^-(x) &= V^-([c,x],\psi) & \forall x > c, \tag{19} \\
\psi_c^+(x) &= -V^+([x,c],\psi), & \psi_c^-(x) &= -V^-([x,c],\psi) & \forall x < c, \tag{20}
\end{align*}

where $V^+([a,b],\psi)$ and $V^-([a,b],\psi)$ denote the positive and the negative variation of $\psi$ on the interval $[a,b]$, respectively. For details see, for example, [15], page 34. In our applications, we are mainly concerned with the positive measures $d\psi_c^+$ and $d\psi_c^-$ induced by $\psi_c^+$ and $\psi_c^-$, respectively (provided $\psi_c^+$ and $\psi_c^-$ are right-continuous). The following lemma shows that $d\psi_c^+$ and $d\psi_c^-$ are independent of $c$, although $\psi_c^+$ and $\psi_c^-$ typically do depend on $c$. In particular, the definition $|d\psi| := d\psi_c^+ + d\psi_c^-$ of the absolute value measure $|d\psi|$ is independent of $c$.

Lemma A.1. Let $\psi \in \mathbb{BV}_{\mathrm{loc}}$ and $c \in \mathbb{R}$. Then $\psi_c^+$, $\psi_c^-$ differ from $\psi_0^+$, $\psi_0^-$ only by constants $K_c^+$, $K_c^-$, respectively. In particular, the positive measures $d\psi_c^+$ and $d\psi_c^-$ are independent of $c$.

Proof. Let $c > 0$. Then, in view of (19)-(20), we have

\[
\psi_0^+(x) = V^+([0,x],\psi) = V^+([0,c],\psi) + V^+([c,x],\psi) = V^+([0,c],\psi) + \psi_c^+(x)
\]

for $x \in (c,\infty)$, and similarly we obtain $\psi_0^+(x) = V^+([0,c],\psi) + \psi_c^+(x)$ for the cases $x \in [0,c]$ and $x \in (-\infty,0)$. That is, $\psi_c^+ = \psi_0^+ + K_c^+$ for some constant $K_c^+$. Analogously, we obtain $\psi_c^+ = \psi_0^+ + K_c^+$ for $c \le 0$, and $\psi_c^- = \psi_0^- + K_c^-$ for $c \le 0$ as well as $c > 0$. □
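
As a small illustration of Lemma A.1, consider the function $\psi(x) = |x|$, chosen here purely as an example (it does not appear in the text above). Evaluating (19)-(20) for the two centers $c = 0$ and $c = 1$ gives the following sketch:

```latex
\begin{align*}
\psi_0^+(x) &= x\,\mathbf{1}_{[0,\infty)}(x), & \psi_0^-(x) &= x\,\mathbf{1}_{(-\infty,0)}(x), \\
\psi_1^+(x) &= \psi_0^+(x) - 1,               & \psi_1^-(x) &= \psi_0^-(x), \\
\psi(0) + \psi_0^+(x) - \psi_0^-(x) &= |x|,   & \psi(1) + \psi_1^+(x) - \psi_1^-(x) &= |x|.
\end{align*}
```

In this example the constants are $K_1^+ = -1$ and $K_1^- = 0$, and both centers induce the same measures: $d\psi_c^+$ is Lebesgue measure restricted to $(0,\infty)$, $d\psi_c^-$ is Lebesgue measure restricted to $(-\infty,0)$, and $|d\psi|$ is Lebesgue measure on the whole line.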

Appendix B: Integration theoretical auxiliaries 

Recall our convention $\int = \int_{(-\infty,\infty)}$ and that $\mathbb{BV}_{\mathrm{loc},d}$ denotes the space of all càdlàg functions in $\mathbb{BV}_{\mathrm{loc}}$.

Lemma B.1. Let $u, v \in \mathbb{BV}_{\mathrm{loc},d}$ such that $\lim_{x\to\pm\infty} u(x)v(x) = c_\pm$ for some constants $c_+, c_- \in \mathbb{R}$. Then, if

\[
\int |v(x-)|\,|du|(x) < \infty \quad\text{and}\quad \int |u(x)|\,|dv|(x) < \infty, \tag{21}
\]

we have the integration-by-parts formula

\[
\int u(x)\,dv(x) = c_+ - c_- - \int v(x-)\,du(x). \tag{22}
\]

Proof. If $-\infty < a < b < \infty$, then one can proceed as in the proof of Theorem II.6.11 in [25] to obtain

\[
\int_{(a,b]} u(x)\,dv(x) = u(b)v(b) - u(a)v(a) - \int_{(a,b]} v(x-)\,du(x), \tag{23}
\]

because $\int_{(a,b]} |v(x-)|\,|du|(x) < \infty$ and $\int_{(a,b]} |u(x)|\,|dv|(x) < \infty$. Now, choosing sequences $(a_n), (b_n) \subset (-\infty,\infty)$ with $a_n \downarrow -\infty$ and $b_n \uparrow \infty$, the statement of the lemma follows from (23), the continuity from below of the finite measures $\int u^+(x)\,dv^+(x)$, $\int u^-(x)\,dv^+(x)$, ... on $(-\infty,\infty)$, and the assumption $\lim_{x\to\pm\infty} u(x)v(x) = c_\pm$. □
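
A quick sanity check of formula (22), and of why the left limit $v(x-)$ appears in it, can be sketched with the illustrative choice $u(x) = e^{-|x|}$ and $v(x) = \mathbf{1}_{[0,\infty)}(x)$. Both functions are assumptions made for this example only; they are càdlàg, of locally bounded variation, and satisfy $c_+ = c_- = 0$ as well as both conditions in (21).

```latex
\begin{align*}
\int u(x)\, dv(x) &= u(0) = 1
   && \text{($dv$ is the unit point mass at $0$)}, \\
v(x-) &= \mathbf{1}_{(0,\infty)}(x), \\
c_+ - c_- - \int v(x-)\, du(x) &= -\int_{(0,\infty)} du(x)
   = -\Bigl( \lim_{x\to\infty} u(x) - u(0) \Bigr) = 1 .
\end{align*}
```

Both sides agree, as (22) asserts.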

Next, we give a sort of Helly-Bray theorem. Recall that $\mathbb{BV}_{1,d}$ denotes the space of all càdlàg functions on $\mathbb{R}$ with variation bounded by 1.

Lemma B.2. Let $\lambda > \lambda' > 0$, let $\psi \in \mathbb{D}_{-\lambda'}$ and suppose that $f, f_1, f_2, \ldots \in \mathbb{BV}_{1,d}$ are nondecreasing and satisfy $\lim_{n\to\infty} \|f_n - f\|_\lambda = 0$. Let $\int \phi_{\lambda'}(x)\,df(x) < \infty$ and $\int \phi_{\lambda'}(x)\,df_n(x) < \infty$ for every $n \in \mathbb{N}$. Then the integrals $\int \psi(x)\,df(x)$ and $\int \psi(x)\,df_n(x)$ exist and we have

\[
\lim_{n\to\infty} \int \psi(x)\,df_n(x) = \int \psi(x)\,df(x).
\]

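Proof. The first claim follows from

\[
\int |\psi(x)|\,df(x) = \int |\psi(x)\,\phi_{-\lambda'}(x)\,\phi_{\lambda'}(x)|\,df(x) \le \|\psi\|_{-\lambda'} \int \phi_{\lambda'}(x)\,df(x)
\]

and the analogous bound for $\int |\psi(x)|\,df_n(x)$, $n \in \mathbb{N}$.

Now let us turn to the second claim. Since $\psi\phi_{-\lambda'}$ is a bounded càdlàg function on the compact interval $\overline{\mathbb{R}}$, we may and do choose for each $\varepsilon > 0$ a step function $\tilde\psi_\varepsilon \in \mathbb{D}$ with a finite number of jumps and satisfying $\|\psi\phi_{-\lambda'} - \tilde\psi_\varepsilon\|_\infty \le \varepsilon$. For $\psi_\varepsilon := \tilde\psi_\varepsilon\,\phi_{\lambda'}$, we thus have $\|\psi - \psi_\varepsilon\|_{-\lambda'} \le \varepsilon$. Of course,

\begin{align*}
\Bigl| \int \psi(x)\,df_n(x) - \int \psi(x)\,df(x) \Bigr|
&\le \Bigl| \int \psi(x)\,d(f_n - f)(x) - \int \psi_\varepsilon(x)\,d(f_n - f)(x) \Bigr| + \Bigl| \int \psi_\varepsilon(x)\,d(f_n - f)(x) \Bigr| \\
&=: S_1(n,\varepsilon) + S_2(n,\varepsilon). \tag{24}
\end{align*}

For the first summand, we obtain
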

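\begin{align*}
S_1(n,\varepsilon)
&= \Bigl| \int \phi_{-\lambda'}(x)\,\phi_{\lambda'}(x)\,\psi(x)\,d(f_n - f)(x) - \int \phi_{-\lambda'}(x)\,\phi_{\lambda'}(x)\,\psi_\varepsilon(x)\,d(f_n - f)(x) \Bigr| \\
&\le \Bigl( \int \phi_{\lambda'}(x)\,df_n(x) + \int \phi_{\lambda'}(x)\,df(x) \Bigr) \|\psi - \psi_\varepsilon\|_{-\lambda'} \\
&\le \Bigl( \int \phi_{\lambda'}(x)\,df_n(x) + \int \phi_{\lambda'}(x)\,df(x) \Bigr) \varepsilon \\
&\le C\varepsilon \tag{25}
\end{align*}

for some finite constant $C > 0$ being independent of $n$ and $\varepsilon$. For the last step, we used the assumption $\int \phi_{\lambda'}(x)\,df(x) < \infty$ and the fact that $\sup_{n\in\mathbb{N}} \int \phi_{\lambda'}(x)\,df_n(x) < \infty$. The latter fact is not completely obvious, so we give the details: Because of $\int \phi_{\lambda'}(x)\,df(x) < \infty$, it is clearly sufficient to show that $\sup_{n\in\mathbb{N}} |\int \phi_{\lambda'}(x)\,d(f - f_n)(x)|$ is bounded above by some finite constant. By our assumptions and the bound (26) below, we can apply the integration-by-parts formula (22) to the functions $f - f_n$ and $\phi_{\lambda'}$ to obtain

\[
\Bigl| \int \phi_{\lambda'}(x)\,d(f - f_n)(x) \Bigr| \le 2\|f - f_n\|_{\lambda'} + \Bigl| \int (f - f_n)(x-)\,d\phi_{\lambda'}(x) \Bigr|.
\]

By our assumptions, the first summand tends to 0 since $\|f_n - f\|_{\lambda'} \le \|f_n - f\|_\lambda$. The second summand is less than or equal to $\int |(f - f_n)(x-)|\,|d\phi_{\lambda'}|(x)$ and we have

\begin{align*}
\int |(f - f_n)(x-)|\,|d\phi_{\lambda'}|(x)
&= \int |\phi_\lambda(x)\,(f - f_n)(x-)|\,\phi_{-\lambda}(x)\,|d\phi_{\lambda'}|(x) \\
&\le 2\|f - f_n\|_\lambda \int_0^\infty \phi_{-\lambda}(x)\,d\phi_{\lambda'}(x). \tag{26}
\end{align*}

Since $\|f - f_n\|_\lambda \to 0$ by assumption, and $\int_0^\infty \phi_{-\lambda}(x)\,d\phi_{\lambda'}(x) < \infty$ by $\lambda > \lambda' > 0$, the left-hand side of (26) converges to 0. In particular, the left-hand side of (26) is bounded above uniformly in $n$. This completes the proof of (25).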
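
The finiteness of the integral on the right-hand side of (26) can be checked by hand. The following sketch assumes the polynomial weight $\phi_\lambda(x) = (1 + |x|)^\lambda$; this explicit form is an assumption made here for illustration, since the weight is not restated in this appendix.

```latex
\begin{align*}
\int_0^\infty \phi_{-\lambda}(x)\, d\phi_{\lambda'}(x)
&= \int_0^\infty (1 + x)^{-\lambda}\, \lambda' (1 + x)^{\lambda' - 1}\, dx \\
&= \lambda' \int_0^\infty (1 + x)^{-(\lambda - \lambda') - 1}\, dx
 = \frac{\lambda'}{\lambda - \lambda'} .
\end{align*}
```

Under this assumption the integral is finite exactly when $\lambda > \lambda'$, which is where the strict inequality between the two exponents enters.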

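Now, the second claim of the lemma would follow from (24) and (25) if we could show that $S_2(n,\varepsilon)$ converges to 0 as $n \to \infty$ uniformly in $\varepsilon \in (0,1]$. By our assumptions and formula (27) below, we can apply the integration-by-parts formula (22) to obtain

\begin{align*}
S_2(n,\varepsilon)
&= \Bigl| \int \psi_\varepsilon(x)\,\phi_{\lambda'}(x)\,\phi_{-\lambda'}(x)\,d(f_n - f)(x) \Bigr| \\
&\le 2\|\psi_\varepsilon\|_{-\lambda'}\,\|f_n - f\|_{\lambda'} + \Bigl| \int (f_n - f)(x-)\,d\psi_\varepsilon(x) \Bigr| \\
&\le 2\bigl( \|\psi_\varepsilon - \psi\|_{-\lambda'} + \|\psi\|_{-\lambda'} \bigr) \|f_n - f\|_{\lambda'} + \Bigl| \int (f_n - f)(x-)\,d\psi_\varepsilon(x) \Bigr|.
\end{align*}

The first summand converges to 0 by our assumptions and $\|\psi_\varepsilon - \psi\|_{-\lambda'} \le \varepsilon \le 1$. Furthermore, the second summand is less than or equal to $\int |(f_n - f)(x-)|\,|d\psi_\varepsilon|(x)$. Recalling $\psi_\varepsilon = \tilde\psi_\varepsilon\,\phi_{\lambda'}$ and that $\tilde\psi_\varepsilon$ is a step function with a finite number of jumps, we now obtain

\begin{align*}
\int |(f_n - f)(x-)|\,|d\psi_\varepsilon|(x)
&\le \|\tilde\psi_\varepsilon\|_\infty \int |(f_n - f)(x-)|\,|d\phi_{\lambda'}|(x) \\
&= \|\tilde\psi_\varepsilon\|_\infty \int |(f_n - f)(x-)\,\phi_\lambda(x)|\,\phi_{-\lambda}(x)\,|d\phi_{\lambda'}|(x) \\
&\le 2\bigl( \|\psi\phi_{-\lambda'}\|_\infty + 1 \bigr) \|f_n - f\|_\lambda \int_0^\infty \phi_{-\lambda}(x)\,d\phi_{\lambda'}(x), \tag{27}
\end{align*}

and this expression converges to 0 because $\|f_n - f\|_\lambda \to 0$ and $\lambda > \lambda' > 0$. That is, $S_2(n,\varepsilon)$ indeed converges to 0 as $n \to \infty$ uniformly in $\varepsilon \in (0,1]$. □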